Finite-sample analysis of least-squares policy iteration
Abstract
In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning method, and report a finite-sample analysis for this algorithm. To do so, we first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds for the case in which the Markov chain possesses a stationary distribution and is β-mixing. Finally, we analyze how the error at each policy evaluation step is propagated through the iterations of a policy iteration method, and derive a performance bound for the LSPI algorithm.
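To make the policy evaluation step concrete, here is a minimal sketch of LSTD with linear function approximation. It is an illustration under stated assumptions, not the paper's implementation: the names `lstd`, `solve_linear_system`, and `one_hot`, the two-state toy chain, and the discount factor 0.5 are all hypothetical. The sketch accumulates the usual LSTD statistics A = Σ φ(x_t)(φ(x_t) − γ φ(x_{t+1}))ᵀ and b = Σ φ(x_t) r_t along one trajectory, then solves A θ = b.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Transition = Tuple[int, float, int]  # (state, reward, next_state)


def solve_linear_system(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the samples do not determine theta")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lstd(
    transitions: Sequence[Transition],
    features: Callable[[int], Vector],
    gamma: float,
) -> Vector:
    """Return theta such that V(x) is approximated by features(x) . theta."""
    d = len(features(transitions[0][0]))
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for state, reward, next_state in transitions:
        phi = features(state)
        phi_next = features(next_state)
        for i in range(d):
            b[i] += phi[i] * reward
            for j in range(d):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear_system(a, b)


def one_hot(state: int) -> Vector:
    """Tabular features for the hypothetical two-state chain."""
    return [1.0 if state == i else 0.0 for i in range(2)]


# Toy trajectory (assumed): the chain alternates 0 -> 1 -> 0 -> ...,
# with reward 1 when leaving state 0 and reward 0 when leaving state 1.
transitions = [(0, 1.0, 1), (1, 0.0, 0), (0, 1.0, 1), (1, 0.0, 0)]
theta = lstd(transitions, one_hot, gamma=0.5)
print(theta)
```

With one-hot features and this deterministic chain, the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, and the sketch recovers these values. In LSPI, an evaluation step of this kind (on state-action features) alternates with greedy policy improvement, which is where the error propagation analyzed in the abstract comes in.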
Similar resources
Convergence Proofs of Least Squares Policy Iteration Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems
Most of the current theory for dynamic programming algorithms focuses on finite state, finite action Markov decision problems, with a paucity of theory for the convergence of approximation algorithms with continuous states. In this paper we propose a policy iteration algorithm for infinite-horizon Markov decision problems where the state and action spaces are continuous and the expectation cann...
Regularized Policy Iteration
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to...
Least-squares methods for policy iteration
Approximate reinforcement learning deals with the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core, po...
Learning an Exercise Policy for American Options from Real Data
We study approaches to learning an exercise policy for American options directly from real data. We investigate an approximate policy iteration method, namely, least squares policy iteration (LSPI), for the problem of pricing American options. We also extend the standard least squares Monte Carlo (LSM) method of Longstaff and Schwartz, by composing sample paths from real data. We test the perfo...
Global least squares solution of matrix equation $\sum_{j=1}^s A_jX_jB_j = E$
In this paper, an iterative method is proposed for solving the matrix equation $\sum_{j=1}^s A_jX_jB_j = E$. This method is based on the global least squares (GL-LSQR) method for solving linear systems of equations with multiple right-hand sides. For applying the GL-LSQR algorithm to solve the above matrix equation, a new linear operator, its adjoint and a new inner product are defined. It is p...
Journal title:
- Journal of Machine Learning Research
Volume 13
Publication date: 2012